RegEx multiline matching [Solved]

Hi all,

My main question: I have a text file whose content closely resembles HTML markup (because it is HTML). The text is arranged consistently enough that I can extract some information from it. The following is the literal content of the file, the exact same text:

<div><b>EmployeeName:</b> Luckas Duckins</div>
<div><b>CCName:</b> Mike McMice</div>
<div><b>CCEmail:</b> MikeMcMice@funinc.com</div>
<div><b>ExpirationDate:</b> 7/17/2015</div>

I have a script that was working last Friday, but when I went back to it today I got no match, <strike>so I wonder what I was doing last Friday that I did not do today</strike>. The script is as follows:

$MyPath = "c:\Path\to\textfile.txt"
$regex99 = @'
(?ms)<div><b>EmployeeName:<\/b> (.+?)</div>
<div><b>CCName:<\/b> (.+?)<\/div>
<div><b>CCEmail:<\/b> (.+?)<\/div>
<div><b>ExpirationDate:<\/b> (.+?)<\/div>
'@

# Read the whole file as one string so the pattern can span lines
$text = [IO.File]::ReadAllText($MyPath)

# Echo the result of the match (True or False)
$text -match $regex99
if ($text -match $regex99)
  {
   $EmployeeName = $matches[1]
   $CCName = $matches[2]
   $CCEmail = $matches[3]
   $ExpirationDate = $matches[4]
  }
"EmpName"
$EmployeeName
"CC Name"
$CCName
"CC Email"
$CCEmail
"EXP Date"
$ExpirationDate

#output was
#True
#EmpName 
#Luckas Duckins
#CC Name
#Mike McMice
#CC Email
#MikeMcMice@funinc.com
#EXP Date
#7/17/2015

<strike>Right now I just get a big False</strike>. I suspect the issue lies with the file itself. I resaved the file after adding a new line at the end, and the script worked. Then I removed the new line, and the script still worked. Each of the following regexes works on its own, but I am trying to get everything in one go.

$regex99 = @'
(?ms)<div><b>EmployeeName:<\/b>\s(.+?)<\/div>
'@

$regex99 = @'
(?ms)<div><b>CCName:<\/b>\s(.+?)<\/div>
'@

$regex99 = @'
(?ms)<div><b>CCEmail:<\/b>\s(.+?)<\/div>
'@
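
One possible explanation for the resave behaviour, stated as an assumption rather than something confirmed in this thread: a here-string pattern contains whatever line breaks the script file was saved with (CRLF or LF), and those have to agree with the line breaks in the text file. A minimal sketch of a pattern that tolerates both, writing the line break explicitly as \r?\n (the $sample, $pattern, $lfMatched and $crlfMatched names are made up for illustration):

```powershell
# Sample text built with LF-only line breaks (an assumption for illustration)
$sample = "<div><b>EmployeeName:</b> Luckas Duckins</div>`n" +
          "<div><b>CCName:</b> Mike McMice</div>`n" +
          "<div><b>CCEmail:</b> MikeMcMice@funinc.com</div>`n" +
          "<div><b>ExpirationDate:</b> 7/17/2015</div>"

# \r?\n accepts either CRLF or LF between the lines, so the pattern
# does not depend on how the script file itself was saved
$pattern = '<div><b>EmployeeName:</b> (.+?)</div>\r?\n' +
           '<div><b>CCName:</b> (.+?)</div>\r?\n' +
           '<div><b>CCEmail:</b> (.+?)</div>\r?\n' +
           '<div><b>ExpirationDate:</b> (.+?)</div>'

# Try the same pattern against the LF text and a CRLF copy of it
$lfMatched   = $sample -match $pattern
$crlfMatched = $sample.Replace("`n", "`r`n") -match $pattern
```

Both $lfMatched and $crlfMatched should come out True, and $matches[1] through $matches[4] hold the four values after the last match.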

I used https://mjolinor.wordpress.com/2012/01/05/powershell-multiline-regex-matching/ as a reference, as well as a post I found on Stack Overflow <strike>(cannot find it anymore :( )</strike>.

Any help is appreciated.

UPDATE:

Found the Stack Overflow post that I used as a reference: http://stackoverflow.com/questions/15375921/powershell-parse-parts-of-a-text-file-and-save-to-csv

UPDATE 2:

I kept working on the script and modified the text file; after resaving the file, the script worked.

Background on the text file: I get the text content from another script, save it to the text file, and then read the file to process it.

Is it possible to save the text to a variable and keep it as a here-string so I can process it?
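
On the question of keeping the text in a variable: -match operates on any string, and a here-string is only a way of typing a multi-line literal, so text already held in a variable can be matched directly. A small sketch, assuming the other script hands back an array of lines (the $lines array below is a hypothetical stand-in for that output):

```powershell
# Hypothetical stand-in for the output of the other script: an array of lines
$lines = @(
    '<div><b>EmployeeName:</b> Luckas Duckins</div>',
    '<div><b>CCName:</b> Mike McMice</div>'
)

# Join the lines into one string with explicit CRLF line breaks
$text = $lines -join "`r`n"

# -match works on the variable directly; no file or here-string is needed
if ($text -match '<div><b>CCName:</b> (.+?)</div>') {
    $CCName = $matches[1]
}
```

If the other script already returns one multi-line string, the join step can be skipped and the variable matched as it is.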


July 20th, 2015 3:51pm

Just an update. I am not using the text file anymore, but this still applies. I get the information from the HTML page, and then I run a replace on it to match the newlines.
$feedURL = "http://website.com/feed/getfeed/" #sample url for AtomFeed
#property object
$property = New-Object System.Collections.Specialized.OrderedDictionary
$property.Add('UseDefaultCredentials', $true)

#I get the AtomFeed specific property that I need
#in the production script I would loop through all .rss.channel.item[n] instances
#to capture the results
$result = ((New-Object Net.Webclient -Property $property ).DownloadString($feedURL) -as [xml]).rss.channel.item[0].description.InnerText

#I replace the new line with the newline that matches my OS (Windows)
$result = $result.Replace("`n","`r`n")

#Then I run the former script
$regex = @'
(?ms)<div><b>EmployeeName:<\/b> (.+?)</div>
<div><b>CCName:<\/b> (.+?)<\/div>
<div><b>CCEmail:<\/b> (.+?)<\/div>
<div><b>ExpirationDate:<\/b> (.+?)<\/div>
'@

#reset variables
$EmployeeName, $CCName, $CCEmail, $ExpirationDate = $null
#check if there are matches
$result -match $regex
#get the values I want
if ($result -match $regex)
  {
   $EmployeeName = $matches[1]
   $CCName = $matches[2]
   $CCEmail = $matches[3]
   $ExpirationDate = $matches[4]
  }

$EmployeeName
$CCName
$CCEmail
$ExpirationDate
That works for me.
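
For looping over several feed items, a rough sketch of one way it could look, using named capture groups and \r?\n so the Replace step is not required. The $descriptions array is hypothetical sample data standing in for the description text of each item (the second record is invented), and only two of the four fields are shown to keep it short:

```powershell
# Hypothetical sample: two description blocks, one with LF and one with CRLF
$descriptions = @(
    "<div><b>EmployeeName:</b> Luckas Duckins</div>`n<div><b>CCName:</b> Mike McMice</div>",
    "<div><b>EmployeeName:</b> Jane Doe</div>`r`n<div><b>CCName:</b> John Roe</div>"
)

# Named groups make the captures self-describing; \r?\n accepts LF or CRLF
$pattern = '<div><b>EmployeeName:</b> (?<EmployeeName>.+?)</div>\r?\n' +
           '<div><b>CCName:</b> (?<CCName>.+?)</div>'

# Collect one object per description that matches the pattern
$records = foreach ($text in $descriptions) {
    if ($text -match $pattern) {
        New-Object PSObject -Property @{
            EmployeeName = $matches['EmployeeName']
            CCName       = $matches['CCName']
        }
    }
}
```

Each entry in $records then exposes .EmployeeName and .CCName as properties, which is easier to pass along than four loose variables.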
July 20th, 2015 8:19pm

jrv,

Thank you for the feedback. I tried a script similar to yours, but for some reason I could not get it to behave as I expected, so I ended up sticking with the process I posted above.

In another turn on this matter: because of limitations on data availability and possible changes in the pattern of the data (it is generated by the server), I shifted my research to accessing the information through a connection string, using http://powershell.com/cs/blogs/tobias/archive/2011/03/01/accessing-data-bases.aspx as a reference. I then run an SQL command to get the information we want as an object.

I have made progress on accessing the data; we are now just ironing out the UI for adding the information to the database, which will happen through SharePoint 2010.

Thanks for your help and time in this matter.



  • Edited by Mr. Potter III Thursday, July 23, 2015 1:49 PM Clarification
July 23rd, 2015 1:47pm

This topic is archived. No further replies will be accepted.
